Distributed Human Computation Framework for Linked Data Co-reference Resolution
Authors
Abstract
Distributed Human Computation (DHC) solves computational problems by harnessing the collaborative effort of a large number of people. It also offers a route to AI-complete problems such as natural language processing. The Semantic Web, with its roots in AI, has many research problems that are considered AI-complete. One example is co-reference resolution, which involves determining whether different URIs refer to the same entity and is a significant hurdle to overcome in the realisation of large-scale Semantic Web applications. In this paper, we propose a framework for building a DHC system on top of the Linked Data Cloud to solve various computational problems. To demonstrate the concept, we focus on handling co-reference resolution when integrating distributed datasets. Traditionally, machine-learning algorithms are used for this task, but they are often computationally expensive, error-prone and do not scale. We designed a DHC system named iamResearcher, which solves the author identity co-reference problem for scientific publications when integrating distributed bibliographic datasets. In our system, we aggregated 6 million bibliographic records from various publication repositories. Users can sign up to the system to audit and align their own publications, thus solving the co-reference problem in a distributed manner. The aggregated results are dereferenceable in the Open Linked Data Cloud.
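The idea of users aligning their own publications can be illustrated with a minimal sketch. It is not iamResearcher's actual implementation: the `CoreferenceStore` class, the `example.org` URIs, and the choice to express results as `owl:sameAs` links are all assumptions made for illustration. Each user confirmation that two author URIs denote the same person merges them into one bundle via a union-find structure, and the bundles are then emitted as triples.

```python
from collections import defaultdict

# Assumption: equivalence is published with owl:sameAs; the abstract does not say so.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


class CoreferenceStore:
    """Hypothetical store: union-find over author URIs, merged by user confirmations."""

    def __init__(self):
        self._parent = {}

    def _find(self, uri):
        # Register unseen URIs as their own root, then walk up with path halving.
        self._parent.setdefault(uri, uri)
        while self._parent[uri] != uri:
            self._parent[uri] = self._parent[self._parent[uri]]
            uri = self._parent[uri]
        return uri

    def confirm_same(self, uri_a, uri_b):
        """Record one user's judgement that two URIs refer to the same author."""
        root_a, root_b = self._find(uri_a), self._find(uri_b)
        if root_a != root_b:
            # Keep the lexicographically smaller root so results are deterministic.
            keep, drop = sorted((root_a, root_b))
            self._parent[drop] = keep

    def bundles(self):
        """Return {canonical URI: set of all URIs in its bundle} for merged URIs only."""
        groups = defaultdict(set)
        for uri in list(self._parent):
            groups[self._find(uri)].add(uri)
        return {root: members for root, members in groups.items() if len(members) > 1}

    def same_as_triples(self):
        """Emit (canonical, owl:sameAs, other) triples for every bundle."""
        triples = []
        for root, members in sorted(self.bundles().items()):
            for uri in sorted(members - {root}):
                triples.append((root, OWL_SAME_AS, uri))
        return triples


# Illustrative URIs only; they do not come from any real repository.
store = CoreferenceStore()
store.confirm_same("http://example.org/repoA/author/17",
                   "http://example.org/repoB/person/j-smith")
store.confirm_same("http://example.org/repoB/person/j-smith",
                   "http://example.org/repoC/creator/942")
triples = store.same_as_triples()
for triple in triples:
    print(triple)
```

Because confirmations are merged transitively, two users who each link a different pair of URIs still contribute to a single bundle, which is one way the work could be spread across many participants.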
Similar resources
Computing Identity Co-Reference Across Drug Discovery Datasets
This paper presents the rules used within the Open PHACTS (http://www.openphacts.org) Identity Management Service to compute co-reference chains across multiple datasets. The web of (linked) data has encouraged a proliferation of identifiers for the concepts captured in datasets, with each dataset using its own identifier. A key data integration challenge is linking the co-referent identifier...
Managing Co-reference on the Semantic Web
Co-reference resolution, or the determination of ‘equivalent’ URIs referring to the same concept or entity, is a significant hurdle to overcome in the realisation of large scale Semantic Web applications. However, it has only recently gained the attention of research communities in the Semantic Web context, and while activities are now underway in identifying co-referent or conflated URIs, litt...
A rule based solution to co-reference resolution in clinical text
OBJECTIVE To build an effective co-reference resolution system tailored to the biomedical domain. METHODS Experimental materials used in this study were provided by the 2011 i2b2 Natural Language Processing Challenge. The 2011 i2b2 challenge involves co-reference resolution in medical documents. Concept mentions have been annotated in clinical texts, and the mentions that co-refer in each doc...
Five Stars of Linked Data Vocabulary Use Editorial
In 2010 Tim Berners-Lee introduced a 5 star rating to his Linked Data design issues page to encourage data publishers along the road to good Linked Data. What makes the star rating so effective is its simplicity, clarity, and a pinch of psychology – is your data 5 star? While there is an abundance of 5 star Linked Data available today, finding, querying, and integrating/interlinking these data ...
Parallelizing Irregular Applications through the YAPPA Compilation Framework
Modern High Performance Computing (HPC) clusters are composed of hundreds of nodes integrating multicore processors with advanced cache hierarchies. These systems can reach several petaflops of peak performance, but are optimized for floating point intensive applications, and regular, localizable data structures. The network interconnection of these systems is optimized for bulk, synchronous tra...